Performance of a Parallel Matrix Multiplication Routine on Intel iPSC/860
Authors
Abstract
The performance of a parallel matrix-matrix multiplication routine with the same functionality as DGEMM of BLAS3 was tested for different numbers of nodes on a 32-node iPSC/860. The routine was then tuned for maximum performance on this particular computer system. Small changes in the original code led to substantially higher performance, and in all tested configurations there is a critical matrix size n ≈ 50·np, where np is the number of processors, above which Intel's non-blocking isend is more efficient than the blocking csend. This shows that special tuning for a single machine pays off for large matrices.
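The two facts the abstract reports can be sketched in a few lines: the routine computes the BLAS3 DGEMM operation C ← αAB + βC, and the tuned code should switch from the blocking csend to the non-blocking isend once the matrix order exceeds roughly 50·np. This is an illustrative Python sketch, not the paper's code; both function names are hypothetical, and only the DGEMM semantics and the reported crossover are taken from the abstract.

```python
# Illustrative sketch (hypothetical names, not the paper's Fortran/C code).

def dgemm_like(alpha, A, B, beta, C):
    """Plain-Python C <- alpha*A*B + beta*C on lists of lists,
    the operation that BLAS3 DGEMM computes."""
    n, k, m = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(m)] for i in range(n)]

def preferred_send_mode(n, num_procs, critical_factor=50):
    """Encode the reported crossover n ~ 50*np: above it, Intel's
    non-blocking isend beat the blocking csend on the iPSC/860."""
    return "isend" if n > critical_factor * num_procs else "csend"
```

On the paper's 32-node machine this heuristic places the crossover near n ≈ 1600: `preferred_send_mode(1024, 32)` gives `"csend"` while `preferred_send_mode(2048, 32)` gives `"isend"`.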
Similar Resources
Computation Time In BMR
Figure 13: Running time of the assembly DGEMM routine vs that of the C routine of the S-method coupled with DGEMM on a single processor. MINDIM=100 for the S-method. Strassen's algorithm has been presented and compared with other parallel matrix multiplication algorithms. On the Intel iPSC/860, the BMR-Strassen method coupled with assembly BLAS routines offers the fastest approach to matrix multip...
Performance Experiments and Optimizations of PDE Sparse Solvers on Hypercubes
In this report we present the results of experiments with the parallel sparse matrix solver of the Parallel Ellpack System. Three different hypercube parallel machines are used to compare and optimize its performance. After a brief description of the parallel sparse matrix solver and a presentation of the machine parameters and features, the measurements of performance of the sparse solver on t...
The Conjugate Gradient Method for Large Sparse Matrices on the Intel iPSC/860 Hypercube
For large sparse unstructured matrices, the critical parts of the Conjugate Gradient method on the iPSC/860 are the inter-processor communications needed for the matrix-vector multiplication and the vector-updates. In this work several implementations are tested and discussed in search for an optimal algorithm. They differ in distribution of the matrix and the various vectors over the processors...
Early Experience With the Intel iPSC/860 At Oak Ridge National Laboratory
This report summarizes the early experience in using the Intel iPSC/860 parallel supercomputer at Oak Ridge National Laboratory. The hardware and software are described in some detail, and the machine's performance is studied using both simple computational kernels and a number of complete applications programs.
A PERFORMANCE STUDY OF SPARSE CHOLESKY FACTORIZATION ON INTEL iPSC/860
The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices, for example, codes from Harwell Subroutine Library [4] and Sparspak [7]. However, there is a lack of such efficient codes on parallel machines in general, and distributed memory machines in particul...
Journal: Parallel Computing
Volume 20, Issue
Pages -
Publication date: 1994